1. Introduction


The usual models for distributed databases [RSL,BG] are based on a set of
"entities" distributed among the nodes of a network. These entities are
accessed by users of the database through "transactions", which are certain
sequences of steps ("actions") involving the individual entities. The steps
are grouped into transactions for two distinct purposes. First, a
transaction is used as a unit of recovery: either all of the steps of a
transaction should be carried out, or none of them should; thus, if a
transaction cannot be completed, its initial steps must be "undone" in some
way. Second, a transaction is used to define atomicity: all of the steps of
a transaction form a logical atomic unit, in the sense that it should appear
to users of the database that all of these steps are carried out
consecutively, without any intervening steps of other transactions. This
requirement that transactions appear to be atomic is called
"serializability" in the literature [EGLT,RSL,BG], and has been widely
accepted as an important correctness criterion for distributed databases.

It seems to me that these two purposes should not be served by the same
transaction mechanism. While I think the usual notion of "transaction" is
adequate for purposes of recovery, I think that it is less appropriate for
defining atomicity. Namely, the requirement of serializability is so strong
that it seems to exclude efficient implementation of many application
databases. This paper suggests superimposing a new mechanism on the
transaction mechanism in order to define atomicity.

The model I use in this paper for a distributed database consists of two
completely distinct levels: a physical level consisting of node processors
connected by a message system and communicating with users by ports, and a
logical level consisting of a centralized concurrent application database.
(The logical level does not involve nodes, messages, or any other
distribution information.) It is the job of the physical system to
"implement", in some appropriate sense, the application database.

The  steps  of  different  application  database  transactions  might  be 
allowed  to  interleave  in  various  ways;  the  set  of  allowable  interleavings 
is  determined  by  the  application  represented.  At  one  extreme,  it  might  be 
specified  that  all  allowable  interleavings  be  serializable;  this  amounts 
to  requiring  that  the  application  database  be  a  centralized 
serial  database.  At  the  other  extreme,  the  interleavings  might  be 
unconstrained.  In  a  banking  database,  a  transfer  transaction  might  consist 
of  a  withdrawal  step  followed  by  a  deposit  step.  In  order  to  obtain  fast 
performance,  the  withdrawals  and  deposits  of  different  transfers  might  be 
allowed  to  interleave  arbitrarily,  even  though  the  users  of  the  banking 
database  are  thereby  presented  with  a  view  of  the  account  balances  which 
includes  the  possibility  of  money  being  "in  transit"  from  one  account  to 
another.  In  between  the  two  extremes,  there  are  many  other  reasonable 
possibilities.

In [FGL], we assume an application database with an arbitrary set of
allowable interleavings of transactions. We show how to modify a
distributed system implementing such an application so that it has the
additional capacity to determine a global database state without stopping
transactions in progress. The consistency of such a global database state
can be checked, and repeated use of this capacity can also aid in recovery
from inconsistent global states. We guarantee that if the original
distributed system only produces allowable interleavings, then the modified
system will also produce only allowable interleavings. Thus, a global state
can be obtained for application databases which are serializable,
arbitrarily interleaved, or anything in between these two extremes.

In this paper, only certain sets of interleavings are considered. The
intention is to consider only sets of interleavings which can be specified
in a way that is suitable for use by a concurrency control algorithm. At
the same time, the sets of interleavings considered should be general
enough to represent the allowable interleavings for important application
databases, such as those for banking.

As a first approximation to a specification method, we might associate with
each transaction its "atomicity" (or "granularity" [GLPT]), which is
formally described by a set of "breakpoints" between different sets of
consecutive steps. Steps not separated by a breakpoint would always be
required to occur atomically (at least from the point of view of the system
users). As a special case of this definition, if there are no breakpoints
for any transaction except at the beginning and end, then this requirement
is simply the usual requirement of serializability. As another special
case, if there are always breakpoints between every pair of consecutive
steps of each transaction, then this requirement allows arbitrary
interleaving. In addition, many intermediate cases are possible.

However,  this  definition  does  not  seem  to  me  to  be  sufficiently 
general  to  express  all  commonly-used  constraints  on  interleavings.  For 
example, consider a banking system with transfer transactions as described
above. Transfers might be allowed to interleave arbitrarily with each other.
However,  we  might  also  want  to  have  another  type  of  transaction,  an 
"audit  transaction"  [FGL],  which  reads  all  of  the  account  balances  and 
returns  their  total.  This  audit  transaction  should  probably  not  be 
allowed  to  interrupt  a  transfer  transaction  between  the  withdrawal  and 
deposit  steps,  for  then  the  audit  would  miss  counting  the  money  "in  transit". 
That  is,  the  entire  transfer  transaction  should  be  atomic  with  respect  to 
the  entire  audit  transaction.  Thus,  the  same  transfer  transaction  should 
have  one  set  of  breakpoints  with  respect  to  other  transfers,  and  another 
set  with  respect  to  audit  transactions. 

This example seems to be representative of a fairly general phenomenon: it
might be appropriate for a transaction to have different sets of
breakpoints with respect to different other transactions. That is, each
transaction might allow different "views" of its activity to different
other transactions. Thus, a natural specification for allowable
interleavings might be in terms of the "relative atomicity" of each
transaction with respect to each other transaction, rather than just in
terms of each transaction's (absolute) "atomicity".

In this paper, a formal definition is given for a type of relative
atomicity, called "multilevel atomicity". The two-level model for
distributed databases is described. A combinatorial lemma is presented,
which yields a necessary and sufficient condition for achieving multilevel
atomicity. Some suggestions are made for using this condition as the basis
for a concurrency control design for multilevel atomicity.

Other researchers [L,GLPT,G,C] have also noted that the usual notion of
serializability needs to be weakened. In particular, [G] contains
interesting preliminary work on specification and concurrency control
design for certain non-serializable interleavings. The multilevel atomicity
of this paper is a generalization of the two-level atomicity described in
[G] under the designation "compatibility sets".

Much work remains to be done in designing and evaluating concurrency
control algorithms for multilevel atomicity. This paper merely suggests
some preliminary definitions and ways in which they might be used. It
remains to be seen whether new concurrency control algorithms which achieve
multilevel atomicity can be made to operate much more efficiently than
existing concurrency control algorithms which achieve serializability.

It  also  remains  to  determine  whether  these  weaker  notions  than 
serializability  are  useful  for  describing  the  constraints  required 
for  real-world  database  applications. 

2. A Model for Asynchronous Parallel Processes

Both the application databases and the physical systems of this paper can
be formalized within the model of [LF] for asynchronous parallel
computation. This unified model allows precise description of distributed
algorithms as processes accessing variables (i.e., either shared variables
or distributed system message ports). In this paper, I will be informal.
Only a brief description of the model is provided; the reader is referred
to [LF] for a complete, rigorous treatment.

The basic entities of the model are processes (nondeterministic automata)
and variables. Processes have states (including start states and possibly
also final states), while variables take on values. An atomic execution
step of a process involves accessing one variable and possibly changing the
process' state or the variable's value or both.

A  system  of  processes  is  a  set  of  processes,  with  certain  of  its  variables 
designated  as  internal  and  others  as  external.  Internal  variables  are 
to  be  used  only  by  the  given  system.  External  variables  are  assumed  to 
be  accessible  to  some  "environment"  (e.g.  other  processes  or  users) 
which  can  change  the  values  between  steps  of  the  given  system. 

The execution of a system of processes is described by a set of execution
sequences. Each sequence is a (finite or infinite) list of steps which the
system could perform when interleaved with appropriate actions by the
environment. Each sequence is obtained by interleaving sequences of steps
of the processes of the system. Each process must have infinitely many
steps in the sequence unless that process reaches a final state.

For describing the external behavior of a system, certain information in
the execution sequences is irrelevant. The external behavior of a system of
processes is the set of sequences derived from the execution sequences by
"erasing" information about process identity, change of process state, and
accesses to internal variables. What remains is just the history of
accesses to external variables.

A distributed problem is any set of sequences of accesses to variables.

A system is said to solve the problem if its external behavior is a subset
of the given problem.
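
The erasure that defines external behavior, and the "solves" relation, can
be sketched in Python. This is only an illustration under a hypothetical
encoding: each step records its process, the process state before and
after, the single variable it accesses, and that variable's value before
and after. The names Step, external_behavior and solves are invented for
the sketch, and only finite sequences are handled, whereas the model also
allows infinite ones.

```python
from typing import Hashable, Iterable, List, NamedTuple, Set, Tuple


class Step(NamedTuple):
    """One atomic step: a process accesses exactly one variable (hypothetical encoding)."""
    process: str
    state_before: Hashable
    state_after: Hashable
    variable: str
    value_before: Hashable
    value_after: Hashable


Access = Tuple[str, Hashable, Hashable]  # (variable, value before, value after)


def external_behavior(execution: Iterable[Step], external: Set[str]) -> Tuple[Access, ...]:
    """Erase process identity, changes of process state, and internal-variable accesses."""
    return tuple(
        (s.variable, s.value_before, s.value_after)
        for s in execution
        if s.variable in external
    )


def solves(executions: Iterable[List[Step]], external: Set[str],
           problem: Set[Tuple[Access, ...]]) -> bool:
    """A system solves a problem if every external behavior sequence lies in the problem."""
    return all(external_behavior(e, external) in problem for e in executions)


# Process p copies the value of the external port into internal variable x;
# process q then writes 7 to the port.
run = [
    Step("p", "s0", "s1", "port", 3, 3),  # p reads the port
    Step("p", "s1", "s2", "x", 0, 3),     # p writes internal x (erased below)
    Step("q", "r0", "r1", "port", 3, 7),  # q writes the port
]
behavior = external_behavior(run, {"port"})
```

Only the two port accesses survive the erasure; the write to x and the
identities of p and q disappear.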

In this paper, I require the technical assumption that no state can be both
a start state and a final state. I also require one general definition not
in [LF]. Namely, if S and S' are systems, then S is a subsystem of S' if the
processes, internal variables and external variables of S are included,
respectively, among those of S', and the internal variables of S are
initialized exactly as they are in S'.

3. Application Databases

My notion of an application database is a centralized, concurrent system
consisting of transactions acting on entities, together with a set of
allowable interleavings of the steps of those transactions. This is
modelled very directly in the model of Section 2: transactions are simply
formalized as processes, while entities are formalized as variables.

More precisely, an application database (S,A) consists of a system S of
processes (called transactions), together with a subset A of the execution
sequences of subsystems of S (called the allowable execution sequences),
such that the following two conditions are satisfied.

(a) All variables of S are internal (i.e., internal to the system).
(They are called entities. This assumption says that the entities
are only accessed via the transactions.)

(b) In every execution sequence e in A, every transaction which appears
eventually appears in a final state. (Thus, all transactions are
supposed to terminate.)

This definition gives a very general notion of an application database.
The (indivisible) steps of transactions are arbitrary accesses to entities,
not necessarily just reading or writing steps (although these two types of
steps are permissible special cases). Transactions can branch
conditionally: for example, based on the values encountered for certain
entities, they might access different entities at later steps. This model
of a transaction is general enough to include most others in the
literature. It also includes some other notions usually regarded as
somewhat different from ordinary transactions: the "transactions with
revoking actions" in [G] are a particular type of nondeterministic
transaction in the present model.
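
As an illustration of a conditionally branching transaction, the sketch
below encodes a transaction as a deterministic automaton whose current state
determines which entity the next step accesses. The encoding (a dict from
non-final states to an entity and an update function) and the transaction
pay_if_funded are hypothetical; the model itself also allows
nondeterministic transactions, which this sketch does not cover.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Tuple

# update(state, value) -> (new state, new value of the accessed entity)
Update = Callable[[Hashable, int], Tuple[Hashable, int]]


class Transaction:
    def __init__(self, start: Hashable, final: FrozenSet[Hashable],
                 steps: Dict[Hashable, Tuple[str, Update]]):
        assert start not in final  # no state is both a start state and a final state
        self.start, self.final, self.steps = start, final, steps

    def run_alone(self, entities: Dict[str, int]) -> Tuple[Hashable, List[str]]:
        """Run with no interleaving; return the final state and the entities accessed."""
        state, accessed = self.start, []
        while state not in self.final:
            entity, update = self.steps[state]  # the state decides which entity is next
            state, entities[entity] = update(state, entities[entity])
            accessed.append(entity)
        return state, accessed


# Hypothetical transaction: read A; only if A >= 10, withdraw 10 from A and deposit it in B.
pay_if_funded = Transaction(
    start="s0",
    final=frozenset({"paid", "refused"}),
    steps={
        "s0": ("A", lambda s, v: ("s1" if v >= 10 else "refused", v)),  # a read-only step
        "s1": ("A", lambda s, v: ("s2", v - 10)),
        "s2": ("B", lambda s, v: ("paid", v + 10)),
    },
)
```

Depending on the value read in its first step, the same transaction either
accesses A, A and B, or stops after one access to A.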

4. Coherent Partial Orders

I  want  to  show  how  to  describe  certain  sets  of  allowable  execution 
sequences.  In  this  section,  I  present  some  preliminary,  rather  abstract, 
definitions  involving  sets  and  partial  orders.  The  definitions  of  this 
section  are  given  at  an  abstract  level  since  they  will  be  used  for  a 
general  combinatorial  lemma  in  Section  7. 

I  first  describe  the  partitions  of  an  arbitrary  set  T  (to  be  thought 
of  as  a  set  of  transactions)  into  levels. 

A k-nest, $\Pi = (\pi_1, \ldots, \pi_k)$, for a set T is a sequence of
equivalence relations on T satisfying the following conditions:

(a) $\pi_1$ consists of exactly one equivalence class,

(b) $\pi_k$ consists of singleton equivalence classes, and

(c) each $\pi_i$, $2 \le i \le k$, is a refinement of its predecessor,
$\pi_{i-1}$.

If $t, t' \in T$, then $\mathrm{level}_\Pi(t,t')$ is the largest i for which
$t \mathrel{\pi_i} t'$.
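
A k-nest and the function level can be sketched as follows, under a
hypothetical encoding in which nest[i][t] is the label of t's equivalence
class in $\pi_{i+1}$ (Python lists are 0-indexed, while the text numbers
levels from 1). The banking nest, with two transfers grouped together at
level 2 and an audit grouped with them only at level 1, is an illustration
and not part of the definition.

```python
from typing import Dict, Hashable, List, Sequence

Nest = List[Dict[Hashable, Hashable]]  # nest[i][t]: class label of t in pi_{i+1}


def is_k_nest(nest: Nest, T: Sequence[Hashable]) -> bool:
    """Check conditions (a)-(c) of the definition of a k-nest for the set T."""
    if not nest:
        return False
    if len({nest[0][t] for t in T}) != 1:             # (a) pi_1 has exactly one class
        return False
    if len({nest[-1][t] for t in T}) != len(set(T)):  # (b) pi_k has singleton classes
        return False
    for i in range(1, len(nest)):                      # (c) pi_i refines pi_{i-1}
        for t in T:
            for u in T:
                if nest[i][t] == nest[i][u] and nest[i - 1][t] != nest[i - 1][u]:
                    return False
    return True


def level(nest: Nest, t: Hashable, u: Hashable) -> int:
    """Largest i (1-based) such that t and u lie in the same class of pi_i."""
    return max(i + 1 for i in range(len(nest)) if nest[i][t] == nest[i][u])


T = ["x1", "x2", "audit"]
bank_nest: Nest = [
    {"x1": "all", "x2": "all", "audit": "all"},
    {"x1": "transfers", "x2": "transfers", "audit": "audits"},
    {"x1": "x1", "x2": "x2", "audit": "audit"},
]
```

Because $\pi_k$ consists of singletons, level(t, t') is at most k-1 whenever
t and t' are distinct, and level(t, t) = k.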

Next,  I  describe  an  abstract  "breakpoint"  function  which  defines 
a  set  of  breakpoints  within  a  totally  ordered  set  for  each  of  several 
"levels",  in  such  a  way  that  the  higher  level  sets  of  breakpoints  always 
include  the  lower  level  sets.  Each  totally  ordered  set  should  be 
thought  of  as  the  set  of  steps  of  some  execution  sequence  of  a  particular 
transaction. 

If X is totally ordered by <, and $k \in \mathbb{N}$, then a k-level
breakpoint function, b, for (X,<) assigns a set of pairs of <-consecutive
elements of X to each i, $1 \le i \le k$, in such a way that:

(a) b(1) contains no pairs,

(b) b(k) contains all pairs, and

(c) $b(i) \subseteq b(i+1)$, for all i, $1 \le i < k$.
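
A direct check of conditions (a) through (c) is sketched below. It assumes a
hypothetical encoding in which X is a Python list in increasing <-order and
b is a dict from each level i to a set of consecutive pairs. The example
data follows the banking example of the Introduction: a transfer has a
breakpoint between its withdrawal and deposit at level 2 but none at level 1,
while an audit has no breakpoints below level k.

```python
from typing import Dict, Hashable, Sequence, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def consecutive_pairs(xs: Sequence[Hashable]) -> Set[Pair]:
    """All pairs of <-consecutive elements of X, where xs lists X in increasing order."""
    return {(xs[j], xs[j + 1]) for j in range(len(xs) - 1)}


def is_breakpoint_function(xs: Sequence[Hashable], b: Dict[int, Set[Pair]], k: int) -> bool:
    """Check that b is a k-level breakpoint function for (X, <)."""
    allowed = consecutive_pairs(xs)
    if set(b) != set(range(1, k + 1)):                 # one set for each level 1..k
        return False
    if any(not b[i] <= allowed for i in b):            # only consecutive pairs
        return False
    if b[1] or b[k] != allowed:                        # (a) and (b)
        return False
    return all(b[i] <= b[i + 1] for i in range(1, k))  # (c) nested sets


# Hypothetical 3-level breakpoint functions for a transfer and an audit.
transfer = ["withdraw", "deposit"]
b_transfer = {1: set(), 2: {("withdraw", "deposit")}, 3: {("withdraw", "deposit")}}
audit = ["read_A", "read_B"]
b_audit = {1: set(), 2: set(), 3: {("read_A", "read_B")}}
```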

If T is a set, then a k-level interleaving specification, I, for T has the
following components:

(a) a collection of disjoint totally ordered sets, $(X_t, <_t)$, one for
each $t \in T$, and

(b) a collection of k-level breakpoint functions, $b_t$, one for each
$(X_t, <_t)$.

Next I define an important condition for a partial order on
$\bigcup_{t \in T} X_t$. I want to express the fact that $\le$ preserves all
of the individual orderings and also respects the restrictions expressed by
the given collection of breakpoint functions.

Let $\Pi$ be a k-nest for a set including T, let
$I = \{((X_t, <_t), b_t) : t \in T\}$ be a k-level interleaving
specification for T, and let $\le$ be a partial order on
$\bigcup_{t \in T} X_t$. Then $\le$ is coherent (for $\Pi$ and I) provided
the following two conditions hold.

(a) The partial order $\le$ contains each partial order $<_t$.

(b) Assume $\mathrm{level}_\Pi(t,t') = i$. Assume $\alpha, \alpha' \in X_t$
and $\alpha <_t \alpha'$. Assume $\beta \in X_{t'}$ and $\alpha \le \beta$.
Finally, assume there is no pair $(\gamma, \gamma') \in b_t(i)$ with
$\alpha \le_t \gamma$ and $\gamma' \le_t \alpha'$. Then $\alpha' \le \beta$.

Intuitively, this latter condition says the following. If a step, $\beta$,
of one transaction, t', follows a step, $\alpha$, of another transaction, t,
then $\beta$ also follows any other step, $\alpha'$, of t which follows
$\alpha$ without being separated from it by a breakpoint. (Here, "follows"
means follows in the partial order $\le$.) The relevant breakpoints are
those in $b_t(i)$, where $i = \mathrm{level}_\Pi(t,t')$; thus they are
determined solely by the nesting level of the two transactions, t and t'.
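
A brute-force reading of conditions (a) and (b) as a Python check follows.
It assumes hypothetical encodings: the partial order is passed as an
explicit reflexive, transitive set of pairs, each $X_t$ as a list in
$<_t$-order, each $b_t(i)$ as a set of consecutive pairs, and level as a
function. The banking data (transfers x1 and x2 at level 2 to each other,
an audit "au" at level 1 to both) is illustrative only, and the cost makes
this suitable only for small examples.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Set, Tuple

Pair = Tuple[Hashable, Hashable]


def is_coherent(
    leq: Set[Pair],                           # the partial order, as a reflexive, transitive set of pairs
    X: Dict[Hashable, List[Hashable]],        # X[t]: the steps of t, listed in <_t order
    b: Dict[Hashable, Dict[int, Set[Pair]]],  # b[t][i]: the breakpoint pairs b_t(i)
    level: Callable[[Hashable, Hashable], int],
) -> bool:
    # Condition (a): leq contains each <_t.
    for xs in X.values():
        if any((a, a2) not in leq for a, a2 in combinations(xs, 2)):
            return False
    # Condition (b).
    for t, xs in X.items():
        for t2, ys in X.items():
            if t == t2:
                continue  # level(t, t) = k and b_t(k) holds every pair, so (b) is vacuous
            breaks = b[t][level(t, t2)]
            for p, q in combinations(range(len(xs)), 2):  # alpha = xs[p] <_t alpha' = xs[q]
                if any((xs[j], xs[j + 1]) in breaks for j in range(p, q)):
                    continue  # a breakpoint separates alpha from alpha'
                for beta in ys:
                    if (xs[p], beta) in leq and (xs[q], beta) not in leq:
                        return False
    return True


def total_order(seq: List[Hashable]) -> Set[Pair]:
    """The reflexive total order given by position in seq."""
    return {(seq[i], seq[j]) for i in range(len(seq)) for j in range(i, len(seq))}


X = {"x1": ["x1.w", "x1.d"], "x2": ["x2.w", "x2.d"], "au": ["au.rA", "au.rB"]}
b = {
    "x1": {1: set(), 2: {("x1.w", "x1.d")}, 3: {("x1.w", "x1.d")}},
    "x2": {1: set(), 2: {("x2.w", "x2.d")}, 3: {("x2.w", "x2.d")}},
    "au": {1: set(), 2: set(), 3: {("au.rA", "au.rB")}},
}


def level(t, u):
    if t == u:
        return 3
    return 2 if {t, u} == {"x1", "x2"} else 1
```

With this data, interleaving the two transfers is coherent, while an audit
that runs between a withdrawal and its deposit (or a transfer that runs
inside the audit) is not.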

5. Multilevel Atomicity

The definitions of this section deal explicitly with a system S of
transactions. I use the abstract definitions in the preceding section to
help describe sets of allowable execution sequences. Intuitively,
transactions are grouped in nested classes so that for each t, the set of
places where a transaction t' can interrupt t is determined solely by the
smallest class containing both t and t'. Moreover, smaller classes
determine at least all of the breakpoints determined by containing classes
(and possibly more). This says that transactions which are grouped in a
common small class might have many relative breakpoints (i.e., can
interleave a great deal), while transactions which are only grouped in a
common large class might have fewer relative breakpoints (i.e., cannot
interleave very much).

For each pair of transactions t and t', I must describe the places at which
t is permitted to be interrupted by steps of t'. Since the transactions need
not be straight-line programs, but can branch in complicated ways, I am
forced to describe separately the places at which each different execution
sequence, e, of t can be interrupted by steps of t'.

A k-level breakpoint specification, $\mathcal{B}$, for a system, S, of
transactions is a family
$\{b_{t,e} : t \text{ is a transaction of } S,\ e \text{ an execution sequence of } t\}$,
where each $b_{t,e}$ is a k-level breakpoint function for the steps of e,
totally ordered according to their occurrence in e. (Formally, the
elements of the ordered set of steps are pairs $(i, \varepsilon_i)$, where
$\varepsilon_i$ is the i-th step of e.)

A k-nest, $\Pi$, for the transactions of a system S and a k-level
breakpoint specification, $\mathcal{B}$, for S can be used in a
straightforward way to define an application database,
$(S, A(\Pi, \mathcal{B}))$. Namely, let e be an execution sequence of a
subsystem of S (i.e., an execution of some of the transactions of S), and T
the set of transactions appearing in e. For each $t \in T$, let $e_t$ denote
the execution sequence of t occurring as a subsequence of e, $X_t$ the set
of steps of t occurring in $e_t$, $<_t$ the order in which those steps occur
in e, and let $b_t$ denote $b_{t,e_t} \in \mathcal{B}$. Then e is
multilevel atomic (for $\Pi$ and $\mathcal{B}$) provided the total order on
$\bigcup_{t \in T} X_t$ describing the order in which all the steps occur in
e is coherent for $\Pi$ and $I = \{((X_t, <_t), b_t) : t \in T\}$. (This
definition just says that all the interruptions occur at the given
breakpoints.)

Let $A(\Pi, \mathcal{B})$ denote the set of execution sequences of
subsystems of S which are multilevel atomic for $\Pi$ and $\mathcal{B}$.

For example, if $\Pi = (\pi_1, \pi_2)$ and $\mathcal{B}$ is the only
possible 2-level breakpoint specification (i.e., no pairs for
$b_{t,e}(1)$, and all pairs for $b_{t,e}(2)$), then the multilevel atomic
execution sequences are just the usual serial executions.
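
For an execution sequence, the order of occurrence is total, so condition
(a) of coherence holds automatically, and condition (b) reduces to: no step
of t' falls strictly between two consecutive steps of t that do not form a
breakpoint of $b_t(\mathrm{level}(t,t'))$. The Python sketch below checks
exactly that. It assumes hypothetical encodings: e is a list of
(transaction, step label) pairs, spec(t, e_t) returns $b_{t,e_t}$ with each
breakpoint given by the 0-based index j of the first step of the pair, and
level is a function. The usage data reproduces the 2-level serial example
just described.

```python
from typing import Callable, Dict, Hashable, List, Set, Tuple

Step = Tuple[Hashable, Hashable]  # (transaction, step label)
Spec = Callable[[Hashable, Tuple[Hashable, ...]], Dict[int, Set[int]]]


def is_multilevel_atomic(e: List[Step], spec: Spec,
                         level: Callable[[Hashable, Hashable], int]) -> bool:
    """spec(t, e_t)[i] holds the j such that (step j, step j+1) of e_t is in b_{t,e_t}(i)."""
    positions: Dict[Hashable, List[int]] = {}
    e_t: Dict[Hashable, List[Hashable]] = {}
    for n, (t, s) in enumerate(e):
        positions.setdefault(t, []).append(n)
        e_t.setdefault(t, []).append(s)
    b = {t: spec(t, tuple(steps)) for t, steps in e_t.items()}
    for n, (u, _) in enumerate(e):
        for t, ps in positions.items():
            if t == u:
                continue
            allowed = b[t][level(t, u)]
            for j in range(len(ps) - 1):
                # A step of u strictly between steps j and j+1 of t must sit at a breakpoint.
                if ps[j] < n < ps[j + 1] and j not in allowed:
                    return False
    return True


# The 2-level case: no breakpoints at level 1, every consecutive pair at level 2.
def spec_serial(t, steps):
    return {1: set(), 2: set(range(len(steps) - 1))}


def level_serial(t, u):
    return 2 if t == u else 1
```

With spec_serial and level_serial, a sequence passes the check exactly when
no transaction is interrupted at all, that is, when it is serial.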

The reader is referred to [G] for treatment of a special case of our
definition corresponding to $\Pi = (\pi_1, \pi_2, \pi_3)$, where
$b_{t,e}(2)$ consists of all pairs of consecutive steps, for all t and e.
That is, transactions in a common $\pi_2$ class can interleave arbitrarily,
but transactions not in a common $\pi_2$ class must be serialized with
respect to each other. The "multilevel" definition of this paper also
allows intermediate degrees of interleaving as well as the two extremes
represented in [G].

6. Simulation of an Application Database

Having described the logical-level centralized and concurrent application
database, I now must describe how this database is to be "implemented" by a
distributed system (or any other system). The physical system implements
the application database by presenting an external interface to the users
which is compatible with allowable executions of the application database.
Correctness for the physical system is thus defined entirely in terms of
its external behavior. The physical system might produce this behavior by
many different methods. For example, it might centralize, distribute or
replicate the entities. It might implement each transaction on one
processor which communicates with other processors in order to access
entities. Alternatively, it might divide up the entities among the nodes of
a network, and allow transactions to "migrate" from entity to entity as
necessary, executing some of their steps on different processors. It is
only the external view which determines correctness.


A definition for implementation follows. Fix an application database
(S,A). Define a finite nonempty set of variables called ports, each of
which can contain a finite set of transaction status words: a transaction
status word is a pair (t,s) where t is a transaction of S and s is either a
start state or a final state of t. Let a be a sequence of accesses to
ports, each access tagged by the label "users" or "system" (to indicate who
is doing the accessing). Then a is syntactically correct provided the
following conditions hold in a.

(a)  Each  port  starts  out  empty,  and  each  successive  access  to  a 
port  begins  with  the  same  value  left  at  the  end  of  the  preceding 
access  to  that  port. 

(b) The changes of port values are all of the following types. The
users can initiate a transaction t at any time by inserting a pair
(t,s) into a port, where s is a start state of t. The system
can change (t,s) to (t,s'), where s is a start state of t and
s' is a final state of t.

(c)  Each  transaction  is  initiated  at  most  once. 

(This  is  a  technical  convenience,  assumed  for  the  sake  of 
consistency  with  the  formal  model  of  [LF] .  If  the  same  transaction 
is  intended  to  be  run  twice,  it  is  simply  duplicated.) 

(d)  Each  transaction  which  is  initiated  by  the  users  is  subsequently 
completed  by  the  system. 
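
Conditions (a) through (d) can be checked mechanically for a finite
sequence, as in the Python sketch below. It assumes a hypothetical encoding
in which each access records who performed it, the port, and the port's
value before and after the access, and in which each access makes at most
one of the permitted changes. For an infinite sequence, condition (d) cannot
be decided this way, so the sketch reads (d) as "completed by the end of the
finite sequence".

```python
from typing import Dict, FrozenSet, Hashable, Iterable, NamedTuple, Set, Tuple

Word = Tuple[Hashable, Hashable]  # transaction status word (t, s)


class PortAccess(NamedTuple):
    who: str                 # "users" or "system"
    port: Hashable
    before: FrozenSet[Word]  # port value when the access begins
    after: FrozenSet[Word]   # port value when the access ends


def syntactically_correct(seq: Iterable[PortAccess],
                          start: Dict[Hashable, Set[Hashable]],
                          final: Dict[Hashable, Set[Hashable]]) -> bool:
    value: Dict[Hashable, FrozenSet[Word]] = {}
    initiated: Set[Hashable] = set()
    completed: Set[Hashable] = set()
    for acc in seq:
        if acc.who not in ("users", "system"):
            return False
        # (a) each port starts empty; an access begins with the value the previous one left
        if acc.before != value.get(acc.port, frozenset()):
            return False
        added, removed = acc.after - acc.before, acc.before - acc.after
        if added or removed:
            if acc.who == "users":
                # (b) users only insert one (t, s) with s a start state; (c) once per t
                if removed or len(added) != 1:
                    return False
                [(t, s)] = added
                if s not in start.get(t, ()) or t in initiated:
                    return False
                initiated.add(t)
            else:
                # (b) the system only changes (t, s) to (t, s'), s a start and s' a final state
                if len(added) != 1 or len(removed) != 1:
                    return False
                [(t, s_final)] = added
                [(t_old, s)] = removed
                if t_old != t or s not in start.get(t, ()) or s_final not in final.get(t, ()):
                    return False
                completed.add(t)
        value[acc.port] = acc.after
    # (d) read for a finite sequence: every initiated transaction has been completed
    return initiated <= completed
```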

It  remains  to  express  the  semantic  requirement  that  a  provide  the 
users  with  results  "consistent  with"  an  allowable  execution  sequence  of 
the  application  database. 

Let  a  be  a  syntactically  correct  sequence,  e  an  execution  sequence  of 
a  subsystem  of  S.  Then  a  is  consistent  with  e  provided  exactly  the  same 
transactions  appear  in  a  and  e,  with  the  same  start  states  and  same 
final states. A sequence, a, is correct for the users and system together
provided a is syntactically correct and consistent with some e in A.

I need a definition of correctness for the system alone. Informally, a
system execution sequence is "correct" if whenever it is run with a
"correct user", the result is correct for the users and system together.
In a little more detail, a sequence of accesses to ports is correct for the
users provided all changes made are among those allowed for the users in
(b) and (c) above. (That is, the users can only initiate transactions,
cannot retract a transaction once it is initiated, and cannot initiate the
same transaction more than once.) Then a sequence is correct for the system
provided that whenever it is interleaved consistently with a correct user
sequence (and the steps of the resulting sequence labelled appropriately),
the result is correct for the users and system together. (The interested
reader is referred to [LF] for a completely formal definition of this
interleaving.)

A system of processes S' implements application database (S,A) provided all
external behavior sequences of S' are correct for the system.

Thus, I use a weak notion of implementation which simply preserves
input-output results. I do not require preservation of the ordering of
transactions: a transaction t is permitted to complete (at a port) before
another transaction t' is initiated (at a port), and yet it might be the
case that some of the steps of t' precede some of the steps of t in all
execution sequences of the application database consistent with the port
behavior.

The weakness of the implementation definition allows some freedom in the
design of the physical system. In particular, for any execution sequence e
of a system S of transactions, a dependency partial order $\le_e$ on the
steps of e is defined as follows: $\le_e$ is the reflexive, transitive
closure of the relation that holds between steps $\sigma$ and $\tau$ of e
whenever $\sigma$ precedes $\tau$ in e and either (i) $\sigma$ and $\tau$
are steps of the same transaction, or (ii) $\sigma$ and $\tau$ are steps
accessing the same entity. Then every total order of the steps of e
consistent with $\le_e$ is also an execution sequence of S, having the same
sequence of values for each entity and the same execution subsequence for
each transaction, as e.

Two execution sequences, e and e', of S are equivalent if $\le_e$ is
identical to $\le_{e'}$. It follows that if a sequence, a, of port accesses
is syntactically correct and consistent with an execution sequence, e,
which is equivalent to some e' in A, then a is correct (for the users and
system together).
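
The dependency order and the equivalence test can be sketched in Python.
The sketch abstracts each step to the entity it accesses (values are
ignored) and names the n-th step of transaction t as (t, n, entity), in the
spirit of the pairs $(i, \varepsilon_i)$ of Section 5; these encodings and
function names are hypothetical. Since every direct dependency points
forward in e, the closure can be built in a single left-to-right pass.

```python
from typing import Dict, Hashable, List, Set, Tuple

StepId = Tuple[Hashable, int, Hashable]  # (transaction, index within the transaction, entity)


def step_ids(e: List[Tuple[Hashable, Hashable]]) -> List[StepId]:
    """Name the n-th step of each transaction (t, n, entity); e lists (transaction, entity)."""
    seen: Dict[Hashable, int] = {}
    ids = []
    for t, x in e:
        n = seen.get(t, 0)
        seen[t] = n + 1
        ids.append((t, n, x))
    return ids


def dependency_order(e: List[Tuple[Hashable, Hashable]]) -> Set[Tuple[StepId, StepId]]:
    """Reflexive, transitive closure of: sigma precedes tau and they share a transaction or entity."""
    ids = step_ids(e)
    below: List[Set[int]] = []  # below[j]: positions i with ids[i] <=_e ids[j]
    for j, (t, _, x) in enumerate(ids):
        reach = {j}
        for i in range(j):
            if ids[i][0] == t or ids[i][2] == x:
                reach |= below[i]  # edges point forward, so below[i] is already complete
        below.append(reach)
    return {(ids[i], ids[j]) for j in range(len(ids)) for i in below[j]}


def equivalent(e1, e2) -> bool:
    return dependency_order(e1) == dependency_order(e2)
```

For example, swapping two adjacent steps that belong to different
transactions and access different entities yields an equivalent sequence,
while swapping two accesses to the same entity does not.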

Example.  If  A  is  the  set  of  "serial"  executions  of  the  transaction 
system,  then  "equivalence  with  some  e  in  A"  amounts  to  the  usual 
definition  for  "serializable"  executions.  If  a  physical  system  guarantees 
that  its  port  behavior  is  consistent  with  a  serializable  execution  sequence, 
then  it  is  also  consistent  with  a  serial  execution  sequence. 
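
Taking A to be the serial executions, "equivalent to some e in A" can be
checked directly from the definition by trying every serial order of the
transactions, as in the brute-force Python sketch below. The
(transaction, entity) encoding is hypothetical, and the search is
exponential; in practice, the same question is usually answered by testing
a precedence graph between transactions for cycles.

```python
from itertools import permutations
from typing import Hashable, List, Set, Tuple

Ex = List[Tuple[Hashable, Hashable]]  # execution sequence as (transaction, entity) steps


def dependency_order(e: Ex) -> Set[Tuple[tuple, tuple]]:
    """The dependency order on steps named (transaction, index, entity)."""
    ids, seen, below = [], {}, []
    for t, x in e:
        ids.append((t, seen.get(t, 0), x))
        seen[t] = seen.get(t, 0) + 1
    for j, (t, _, x) in enumerate(ids):
        reach = {j}
        for i in range(j):
            if ids[i][0] == t or ids[i][2] == x:
                reach |= below[i]
        below.append(reach)
    return {(ids[i], ids[j]) for j in range(len(ids)) for i in below[j]}


def serializable(e: Ex) -> bool:
    """Is e equivalent to some serial execution of the same transactions? (brute force)"""
    order = dependency_order(e)
    transactions = list(dict.fromkeys(t for t, _ in e))
    for perm in permutations(transactions):
        # run each transaction's steps, in their original order, one transaction at a time
        serial = [step for t in perm for step in e if step[0] == t]
        if dependency_order(serial) == order:
            return True
    return False
```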

Example. A popular model for distributed databases is the "migrating
transaction" model described in [RSL]. In this model, entities of the
database reside at nodes of a network of processors, and the transactions
migrate from entity to entity as necessary, executing some of their steps
on different processors. In more detail, a transaction t, with start state
s, originates at a processor o. A message (o,t,s) is sent to the processor
owning the entity which t accesses when it is in state s. A processor
receiving a message (o,t,s) "performs" the indicated step by changing the
value of the entity, updating t's state, and sending a new message
(o,t,s'), where s' is the new state. If s' is not a final state, the
message is sent to the processor owning the appropriate entity. If s' is a
final state, the message is sent back to the originator o. In this way, an
execution sequence e of the system of transactions is actually "performed"
by the processors. (The total order of the sequence is determined by real
clock time.) This execution sequence is constructed to be consistent with
the port behavior of the system. It suffices for external port correctness
to ensure that the execution sequence e "performed" by the processors is
one which is equivalent to some e' in A.
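
A toy, single-threaded simulation of the migrating transaction model is
sketched below. It assumes straight-line transactions whose state is just
the index of the next step (so the final state is the program length),
models the network as a pool of messages delivered in random order, and
tags each message with its destination processor; all names are
hypothetical. The simulation performs no concurrency control, so the
execution sequence it records need not be equivalent to any sequence in A;
it only illustrates how that sequence comes to be "performed".

```python
import random
from typing import Callable, Dict, Hashable, List, Tuple

Program = List[Tuple[str, Callable[[int], int]]]  # step s: (entity accessed, update to its value)


def run_migrating(programs: Dict[Hashable, Program], origin: Dict[Hashable, str],
                  owner: Dict[str, str], values: Dict[str, int], seed: int = 0):
    """Deliver messages (o, t, s) in random order; return the performed sequence and completions."""
    rng = random.Random(seed)

    def destination(o: str, t: Hashable, s: int) -> str:
        # A final state goes back to the originator; otherwise to the owner of the next entity.
        return o if s == len(programs[t]) else owner[programs[t][s][0]]

    in_transit = [(destination(origin[t], t, 0), (origin[t], t, 0)) for t in programs]
    performed: List[Tuple[Hashable, str]] = []  # the execution sequence e, in "real time" order
    finished_at: Dict[Hashable, str] = {}
    while in_transit:
        proc, (o, t, s) = in_transit.pop(rng.randrange(len(in_transit)))  # any message may arrive
        if s == len(programs[t]):
            finished_at[t] = proc  # the originator learns that t reached its final state
            continue
        entity, update = programs[t][s]
        assert owner[entity] == proc  # only the owning processor performs the step
        values[entity] = update(values[entity])
        performed.append((t, entity))
        in_transit.append((destination(o, t, s + 1), (o, t, s + 1)))
    return performed, finished_at


# Two transfers between accounts A (stored at P1) and B (stored at P2).
programs = {
    "x1": [("A", lambda v: v - 10), ("B", lambda v: v + 10)],
    "x2": [("B", lambda v: v - 5), ("A", lambda v: v + 5)],
}
values = {"A": 100, "B": 50}
e, done = run_migrating(programs, {"x1": "P1", "x2": "P2"}, {"A": "P1", "B": "P2"}, values, seed=3)
```

Whatever order the messages arrive in, each transaction's steps are
performed in program order and each completion is reported to its
originator; only the interleaving recorded in e varies with the seed.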

Now consider the case in which A is a set of multilevel atomic sequences;
that is, assume that $\Pi$ is a k-nest for the transactions of S,
$\mathcal{B} = \{b_{t,e} : t \text{ is a transaction of } S,\ e \text{ an execution sequence of } t\}$
is a k-level breakpoint specification for S, and $A = A(\Pi, \mathcal{B})$.
We say that an execution sequence e of S is totally coherent (resp.
partially coherent) for $\Pi$ and $\mathcal{B}$ provided the dependency
partial order $\le_e$ is extendable to a total order (resp. partial order)
which is coherent for $\Pi$ and $I = \{((X_t, <_t), b_t) : t \in T\}$,
where T is the set of transactions appearing in e, $e_t$ denotes the
execution sequence of t occurring as a subsequence of e, $X_t$ and $<_t$
are the set of steps of $e_t$ and their order in e, and $b_t$ denotes
$b_{t,e_t}$. It follows directly from the definitions that an execution
sequence e of S is equivalent to one which is multilevel atomic for $\Pi$
and $\mathcal{B}$ if and only if e is totally coherent for $\Pi$ and
$\mathcal{B}$. Thus, it suffices to ensure that each sequence of port
accesses is consistent with some totally coherent execution sequence of S.
In particular, if the migrating transaction model is used, it suffices to
ensure that the execution sequence "performed" by the processors is totally
coherent.

Note that "totally coherent" generalizes "serializable" in the same sense
that "multilevel atomic" generalizes "serial".
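
For tiny examples, total coherence can be tested by brute force: try every
total order of the steps, keep those that respect the direct dependencies
(which is exactly containing $\le_e$), and check whether one of them is
coherent. The Python sketch below does this under hypothetical encodings:
steps are named (transaction, index, entity), $b_t(i)$ is a set of indices
j marking a breakpoint between steps j and j+1, and the banking data is
illustrative. It is exponential and is not meant as a concurrency control.

```python
from itertools import permutations
from typing import Callable, Dict, Hashable, List, Set, Tuple

Ex = List[Tuple[Hashable, Hashable]]        # (transaction, entity) steps in order of occurrence
Spec = Dict[Hashable, Dict[int, Set[int]]]  # b[t][i]: indices j with (step j, step j+1) in b_t(i)


def multilevel_atomic(seq: List[tuple], b: Spec, level: Callable) -> bool:
    """seq lists step ids (t, n, entity), each transaction's steps in increasing n."""
    pos: Dict[Hashable, List[int]] = {}
    for p, (t, _, _) in enumerate(seq):
        pos.setdefault(t, []).append(p)
    for p, (u, _, _) in enumerate(seq):
        for t, ps in pos.items():
            if t == u:
                continue
            for j in range(len(ps) - 1):
                if ps[j] < p < ps[j + 1] and j not in b[t][level(t, u)]:
                    return False
    return True


def totally_coherent(e: Ex, b: Spec, level: Callable) -> bool:
    """Does <=_e extend to a coherent total order? Tries every order: tiny examples only."""
    ids, seen = [], {}
    for t, x in e:
        ids.append((t, seen.get(t, 0), x))
        seen[t] = seen.get(t, 0) + 1
    # A total order contains <=_e exactly when it respects these direct dependencies.
    direct = [(ids[i], ids[j]) for j in range(len(ids)) for i in range(j)
              if ids[i][0] == ids[j][0] or ids[i][2] == ids[j][2]]
    for perm in permutations(ids):
        where = {s: p for p, s in enumerate(perm)}
        if all(where[a] < where[c] for a, c in direct) and multilevel_atomic(list(perm), b, level):
            return True
    return False


# Transfers x1, x2 (level 2 to each other) and an audit "au" (level 1 to both).
b = {"x1": {1: set(), 2: {0}, 3: {0}},
     "x2": {1: set(), 2: {0}, 3: {0}},
     "au": {1: set(), 2: set(), 3: {0}}}


def level(t, u):
    if t == u:
        return 3
    return 2 if t.startswith("x") and u.startswith("x") else 1


# The audit reads A after x1's withdrawal and B after x1's deposit.
audit_after = [("x1", "A"), ("au", "A"), ("x1", "B"), ("au", "B")]
# The audit reads A before x2's deposit to A, but B after x2's withdrawal from B.
audit_splits = [("au", "A"), ("x2", "B"), ("au", "B"), ("x2", "A")]
```

Here audit_after is not itself multilevel atomic (the audit runs between
x1's withdrawal and deposit), but it is equivalent to running x1 first, so
it is totally coherent. audit_splits is not, which matches the intuition
that such an audit misses the money in transit.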

It is not immediately obvious how a concurrency control might ensure total
coherence. Some help is provided by the lemma in the next section.

7.  A  Combinatorial  Lemma 

In  this  section,  I  state  and  prove  a  combinatorial  lemma  which  will 
be  used  in  the  next  section  to  derive  a  necessary  and  sufficient  condition 
for  multilevel  atomicity.  The  lemma  requires  only  the  abstract  definitions 
in  Section  4. 

For this section, let T be a fixed set, let $\Pi = (\pi_1, \ldots, \pi_k)$ be a fixed
k-nest for a set including T, and let $I = \{((X_t, <_t), b_t) : t \in T\}$ be a fixed k-level
interleaving specification for T. Let "coherent" mean "coherent for $\Pi$
and $I$", and write "level" for "$\mathrm{level}_\Pi$".

Lemma. If $<$ is a coherent partial order, then there is a coherent total
order $<'$ which contains $<$.

Proof. Let $<^{(1)}$ denote $<$. A sequence of stages numbered $2, \ldots, k$ is carried
out. Each stage $i$ inserts additional pairs into the ordering
relation $<^{(i-1)}$, yielding $<^{(i)}$. Then $<'$ is defined to be $<^{(k)}$. It is shown,
inductively on $i$, $1 \le i \le k$, that (a) $<^{(i)}$ is a coherent partial order,
and (b) if $\alpha \in X_t$, $\beta \in X_{t'}$, and $\mathrm{level}(t,t') < i$, then $\alpha$ and $\beta$ are
$<^{(i)}$-comparable. Conditions (a) and (b) are trivially true for $i = 1$.
Conditions (a) and (b) for $i = k$ clearly imply the needed result.

Stage $i$ ($2 \le i \le k$).

Partition $X = \bigcup_{t \in T} X_t$ into segments, where each segment S is a maximal
subset of some $X_t$ with the property that there are no pairs in $b_t(i-1)$
having both components in S. (That is, each $X_t$ is divided into segments at
the breakpoints given by $b_t(i-1)$.)
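The partition into segments can be sketched in Python as follows. The sketch follows the parenthetical reading (each $X_t$ is cut at its $b_t(i-1)$ breakpoints) and assumes, as a representation choice not taken from the paper, that $X_t$ is a list in $<_t$ order and that $b_t(i-1)$ is a set of pairs of consecutive steps; the function name `segments` is hypothetical.

```python
from typing import List, Set, Tuple


def segments(steps: List[str], bps: Set[Tuple[str, str]]) -> List[List[str]]:
    """Cut X_t (listed in <_t order) at the breakpoint pairs in `bps`."""
    if not steps:
        return []
    segs = [[steps[0]]]
    for prev, cur in zip(steps, steps[1:]):
        if (prev, cur) in bps:
            segs.append([cur])    # breakpoint between prev and cur: new segment
        else:
            segs[-1].append(cur)  # no breakpoint: extend the current segment
    return segs


if __name__ == "__main__":
    print(segments(["a1", "a2", "a3", "a4"], {("a2", "a3")}))
    # [['a1', 'a2'], ['a3', 'a4']]
```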
Define a directed graph G whose nodes are all the segments. G
contains an edge from segment $S_1$ to segment $S_2$ exactly if there exist
$\alpha \in S_1$, $\beta \in S_2$ with $\alpha <^{(i-1)} \beta$.

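Totally order the strongly connected components of G as $\mathcal{S}_1, \mathcal{S}_2, \ldots$
(this is possible because the graph of components is acyclic),
so that G contains no edges from any segment in $\mathcal{S}_m$ to any segment in $\mathcal{S}_n$
with $n < m$. Then define $<^{(i)}$ by adding to $<^{(i-1)}$ all pairs $(\alpha, \beta)$, where
$\alpha \in S_1 \in \mathcal{S}_m$, $\beta \in S_2 \in \mathcal{S}_n$, and $m < n$, and then taking the transitive
closure.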
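A single stage, and the whole sequence of stages in the proof, can be sketched in Python as below. This is an illustration under assumptions that are not the paper's: relations are finite sets of pairs (x, y) meaning x < y, `seq[t]` lists $X_t$ in $<_t$ order, `breaks[t][j]` holds the consecutive pairs of $b_t(j)$, strongly connected components are found by brute-force mutual reachability rather than by an efficient algorithm such as Tarjan's, and all function names are hypothetical. Components reachable from more segments are placed later, which gives a valid total order of the components: if one component reaches another, everything reaching the first also reaches the second, together with the first component's own segments. The code does not check that its input is coherent; the lemma only guarantees a coherent result when it is.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation (not from the paper); see the lead-in.
Pair = Tuple[str, str]


def transitive_closure(pairs: Set[Tuple]) -> Set[Tuple]:
    """Naive fixed-point transitive closure; adequate for small examples."""
    rel = set(pairs)
    while True:
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c} - rel
        if not new:
            return rel
        rel |= new


def segments(steps: List[str], bps: Set[Pair]) -> List[List[str]]:
    """Divide X_t into segments at its breakpoint pairs."""
    segs: List[List[str]] = [[steps[0]]] if steps else []
    for prev, cur in zip(steps, steps[1:]):
        if (prev, cur) in bps:
            segs.append([cur])
        else:
            segs[-1].append(cur)
    return segs


def stage(lt: Set[Pair], seq: Dict[str, List[str]],
          breaks: Dict[str, Dict[int, Set[Pair]]], i: int) -> Set[Pair]:
    """Stage i: extend <^(i-1) (the transitive relation lt) to <^(i)."""
    segs = [s for t, xs in seq.items() for s in segments(xs, breaks[t][i - 1])]
    seg_of = {x: n for n, s in enumerate(segs) for x in s}
    nodes = range(len(segs))
    # Graph G on segments: edge n -> m if some step of n precedes some step of m.
    edges = {(seg_of[a], seg_of[b]) for (a, b) in lt if seg_of[a] != seg_of[b]}
    reach = transitive_closure(edges) | {(n, n) for n in nodes}
    # Strongly connected components: classes of mutually reachable segments.
    comp = {n: frozenset(m for m in nodes if (n, m) in reach and (m, n) in reach)
            for n in nodes}

    def rank(c: frozenset) -> Tuple[int, int]:
        # Components reachable from more segments come later; ties by index.
        rep = min(c)
        return (sum(1 for n in nodes if (n, rep) in reach), rep)

    pos = {c: k for k, c in enumerate(sorted(set(comp.values()), key=rank))}
    # Add (alpha, beta) whenever alpha's component precedes beta's, then close.
    new = {(a, b) for a in seg_of for b in seg_of
           if pos[comp[seg_of[a]]] < pos[comp[seg_of[b]]]}
    return transitive_closure(lt | new)


def extend_to_total(lt: Set[Pair], seq: Dict[str, List[str]],
                    breaks: Dict[str, Dict[int, Set[Pair]]], k: int) -> Set[Pair]:
    """Run stages 2..k starting from <^(1) = lt, assumed to be coherent."""
    rel = transitive_closure(lt)
    for i in range(2, k + 1):
        rel = stage(rel, seq, breaks, i)
    return rel


def linearize(rel: Set[Pair]) -> List[str]:
    """List the elements of a strict total order by their number of predecessors."""
    elems = {x for pair in rel for x in pair}
    return sorted(elems, key=lambda x: sum(1 for (_, y) in rel if y == x))


if __name__ == "__main__":
    seq = {"T1": ["a1", "a2"], "T2": ["c1", "c2"]}
    breaks = {"T1": {1: set()}, "T2": {1: set()}}  # serializability: b_t(1) empty
    base = {("a1", "a2"), ("c1", "c2"), ("c2", "a1")}
    print(linearize(extend_to_total(base, seq, breaks, k=2)))  # ['c1', 'c2', 'a1', 'a2']
```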


I now prove the needed properties (a) and (b) for $<^{(i)}$, assuming
that they hold for $<^{(i-1)}$.


Claim 1. $<^{(i)}$ is a partial order.


Proof of Claim 1. There are no edges in $<^{(i-1)}$ from $\alpha \in S_1 \in \mathcal{S}_m$ to
$\beta \in S_2 \in \mathcal{S}_n$, where $n < m$. Also, all edges in $<^{(i)}$ not in $<^{(i-1)}$
go from $\alpha \in S_1 \in \mathcal{S}_m$ to $\beta \in S_2 \in \mathcal{S}_n$, where $m < n$. Thus, there is no cycle
in $<^{(i)}$ involving a new edge. Since $<^{(i-1)}$ is a partial order, there
are no cycles in $<^{(i)}$.


Claim 2. $<^{(i)}$ is coherent.


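Proof of Claim 2. Assume $\mathrm{level}(t,t') = j$, $\alpha, \alpha' \in X_t$, and $\alpha <_t \alpha'$. Assume
$\beta \in X_{t'}$ and $\alpha <^{(i)} \beta$. Assume there is no pair $(\gamma, \gamma') \in b_t(j)$ with
$\alpha \le_t \gamma$ and $\gamma' \le_t \alpha'$. I show that $\alpha' <^{(i)} \beta$. The result is trivial
if $t = t'$, so assume that $t \ne t'$.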
Case 1. $\alpha <^{(i-1)} \beta$.

Then the coherence of $<^{(i-1)}$ implies that $\alpha' <^{(i-1)} \beta$, and hence $\alpha' <^{(i)} \beta$, since $<^{(i)}$ contains $<^{(i-1)}$.


Case 2. $\alpha \not<^{(i-1)} \beta$.

Then $\alpha \in S_1 \in \mathcal{S}_m$ and $\beta \in S_2 \in \mathcal{S}_n$ for some $m < n$.


Since $\alpha <^{(i)} \beta$ and $<^{(i)}$ contains $<^{(i-1)}$, it follows that $\beta \not<^{(i-1)} \alpha$,
so that $\alpha$ and $\beta$ are $<^{(i-1)}$-incomparable. Then property (b) applied to
$<^{(i-1)}$ implies that $j$ $(= \mathrm{level}(t,t')) \ge i-1$. Then $b_t(i-1) \subseteq b_t(j)$ by
the definition of a k-level breakpoint function. But $S_1$ includes all
elements from $\alpha$ up to the next $b_t(i-1)$ breakpoint in $X_t$; since $\alpha$ and
$\alpha'$ have no intervening $b_t(j)$-breakpoints, they also have no intervening
$b_t(i-1)$-breakpoints, so that $\alpha' \in S_1$. The definition of $<^{(i)}$ then
insures the needed result.


In the following, a segment S is said to belong to an element
$t \in T$ if $S \subseteq X_t$.

Claim 3. For each m, the following holds. If $S, S' \in \mathcal{S}_m$, S belongs to
t and S' belongs to t', then t and t' are $\pi_i$-equivalent.


Proof of Claim 3. If not, then some $\mathcal{S}_m$ contains a cycle
$S_0, S_1, \ldots, S_\ell = S_0$ of segments such that for each j, $0 \le j \le \ell-1$,
there exist $\alpha \in S_j$, $\beta \in S_{j+1}$ with $\alpha <^{(i-1)} \beta$, and such that two of the
segments belong to $\pi_i$-inequivalent elements of T.

Let S and S' be two distinct segments in this cycle, belonging to
elements t and t' respectively, where (i) t and t' are not $\pi_i$-equivalent, and (ii) any
segment S'' following S and preceding S' in the cycle belongs to some
t'' which is $\pi_i$-equivalent to t. Then if $\alpha$ is the last (in the $<_t$-ordering)
element of S and $\beta$ is the last (in the $<_{t'}$-ordering) element of S',
we claim that $\alpha <^{(i-1)} \beta$. This is shown by induction on the number of
segments following S and preceding S' in the cycle.

Inductive Step. There exists $\alpha' \in S$ such that $\alpha' <^{(i-1)} \beta'$, where $\beta'$
is the last step of the cycle-successor of S. By inductive hypothesis (or
trivially, if S' itself is S's cycle-successor), it follows that
$\beta' \le^{(i-1)} \beta$. Thus, $\alpha' <^{(i-1)} \beta$. Now, $j = \mathrm{level}(t,t') \le i-1$, by
assumption (i), so $b_t(j) \subseteq b_t(i-1)$. But $\alpha$ precedes the next $b_t(i-1)$
breakpoint following $\alpha'$, so $\alpha$ also
precedes the next $b_t(j)$ breakpoint following $\alpha'$. Coherence of
$<^{(i-1)}$ implies that $\alpha <^{(i-1)} \beta$.

Applying this result repeatedly around the cycle shows that there
are two distinct segments, S and S', such that $\alpha <^{(i-1)} \beta$ and $\beta <^{(i-1)} \alpha$,
where $\alpha$ and $\beta$ are the last steps of S and S' respectively. But this
contradicts the assumption that $<^{(i-1)}$ is a partial order.
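Claim 4. If $\alpha \in X_t$, $\beta \in X_{t'}$, and $\mathrm{level}(t,t') < i$, then $\alpha$ and $\beta$ are
$<^{(i)}$-comparable.

Proof of Claim 4. By Claim 3, t and t' do not have any segments in the
same strongly connected component $\mathcal{S}_m$. Thus, $\alpha \in S_1 \in \mathcal{S}_m$, $\beta \in S_2 \in \mathcal{S}_n$,
and $m \ne n$. But then $<^{(i)}$ is defined to contain the pair $(\alpha, \beta)$ if $m < n$,
and to contain $(\beta, \alpha)$ if $n < m$.

Claims 1 and 2 give property (a) for $<^{(i)}$, and Claim 4 gives property (b),
which completes the induction.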


8. A Necessary and Sufficient Condition for Multilevel Atomicity

The lemma of Section 7 is now used to restate the correctness
condition at the end of Section 6. Namely, assume that $\Pi$ and $\mathcal{B}$ are
as at the end of Section 6. Then an execution sequence e is equivalent
to one which is multilevel atomic for $\Pi$ and $\mathcal{B}$ if and only if e is partially
coherent for $\Pi$ and $\mathcal{B}$. (By the lemma, any coherent partial order extending
$<_e$ can be further extended to a coherent total order, so e is partially
coherent exactly when it is totally coherent.) Thus, it suffices to insure that each sequence
of port accesses is consistent with some partially coherent execution
sequence of S. In particular, if the migrating transaction model is
used, it suffices to insure that the execution sequence e "performed" by
the processors is partially coherent for $\Pi$ and $\mathcal{B}$. In other words, e
must have a dependency partial order which is extendable to a partial
order which is coherent for $\Pi$ and I (where I is defined as at the end
of Section 6).
9. Concurrency Control for Multilevel Atomicity

In  this  section,  I  discuss  how  a  concurrency  control  mechanism 
might  take  advantage  of  some  of  the  preceding  ideas.  I  want  to  design 
concurrency  controls  which  use  the  correctness  conditions  stated  in 
Section  8.  Specifically,  I  use  the  migrating  transaction  model, 

and consider how to insure that any execution sequence e "performed" by
the processors has a dependency partial order $<_e$ which is extendable
to a coherent partial order.

It will be necessary to make an additional assumption about a
breakpoint specification for the application database (S,A). Namely,
in order to be able to determine the locations of breakpoints while the execution
sequence e is being performed, it is necessary to assume a "compatibility"
condition: if two execution sequences of a transaction share a common
prefix e, then either both execution sequences have a breakpoint
immediately after e, or neither does.
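When a transaction's possible execution sequences are listed explicitly, the compatibility condition can be checked mechanically. The following Python sketch is only illustrative: it treats one breakpoint level at a time (apply it to each level $j$ separately), records breakpoints as prefix lengths p (a breakpoint immediately after the first p steps), and, as an assumption of this sketch rather than of the paper, constrains only positions that lie strictly inside both sequences, so a sequence that ends at a shared prefix imposes nothing there. The name `compatible` and the action labels are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Set, Tuple

# Hypothetical representation: (execution sequence, breakpoint prefix lengths).
Spec = Tuple[Sequence[str], Set[int]]


def compatible(specs: List[Spec]) -> bool:
    """Check the compatibility condition for one transaction at one level."""
    for (s1, b1), (s2, b2) in combinations(specs, 2):
        n = 0  # length of the longest common prefix of s1 and s2
        while n < min(len(s1), len(s2)) and s1[n] == s2[n]:
            n += 1
        # Assumption: only positions strictly inside both sequences are constrained.
        for p in range(1, min(n, len(s1) - 1, len(s2) - 1) + 1):
            if (p in b1) != (p in b2):
                return False
    return True


if __name__ == "__main__":
    a = (("read x", "write y", "write z"), {1})
    b = (("read x", "write w"), {1})
    c = (("read x", "write w"), set())
    print(compatible([a, b]))  # True
    print(compatible([a, c]))  # False: after the shared prefix "read x" only a has a breakpoint
```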


In order to insure extendability of $<_e$ to a coherent partial order,
consider the "smallest possible" coherent extension
of $<_e$. This can be defined as follows. Given a set T, a k-nest $\Pi$
for a set containing T, a k-level interleaving specification
$I = \{((X_t, <_t), b_t) : t \in T\}$ for T, and a partial order $<$ on $X = \bigcup_{t \in T} X_t$ containing
all the $<_t$, define the coherent closure of $<$ (with respect to $\Pi$ and I)
to be the relation obtained from $<$ by closing under condition (b) of
the coherence definition and under transitivity; this relation need not be a partial order.
Then it is easy to see that $<_e$ is extendable
to a coherent partial order if and only if the coherent closure of
$<_e$ is a partial order.
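The coherent closure can be computed as a fixed point, as in the following Python sketch. The representation is an assumption of this sketch, not the paper's: relations are sets of step pairs, `seq[t]` lists $X_t$ in $<_t$ order, `breaks[t][j]` holds the consecutive pairs of $b_t(j)$, and `level` stands in for $\mathrm{level}_\Pi$. Pairs with $t = t'$ are skipped, as in the proof of Claim 2. Because the result is transitive, it fails to be a partial order exactly when it contains a pair $(x, x)$. The example uses two transactions that each touch two entities in a crossed pattern: the closure has a cycle when no breakpoints are allowed (the serializable case), but not when T1 has a level-1 breakpoint between its two steps.

```python
from typing import Callable, Dict, List, Set, Tuple

Pair = Tuple[str, str]  # hypothetical representation: (x, y) means x < y


def coherent_closure(lt: Set[Pair], seq: Dict[str, List[str]],
                     breaks: Dict[str, Dict[int, Set[Pair]]],
                     level: Callable[[str, str], int]) -> Set[Pair]:
    """Smallest transitive relation containing lt that is closed under condition (b)."""
    owner = {s: t for t, steps in seq.items() for s in steps}
    pos = {s: p for steps in seq.values() for p, s in enumerate(steps)}
    rel = set(lt)
    while True:
        # Transitivity.
        new = {(a, d) for (a, b) in rel for (c, d) in rel if b == c}
        # Condition (b): alpha < beta (beta in t' != t) forces every later step of t
        # up to the next b_t(level(t, t')) breakpoint to precede beta as well.
        for alpha, beta in rel:
            t, u = owner[alpha], owner[beta]
            if t == u:
                continue
            steps, bps = seq[t], breaks[t][level(t, u)]
            for q in range(pos[alpha] + 1, len(steps)):
                if (steps[q - 1], steps[q]) in bps:
                    break
                new.add((steps[q], beta))
        new -= rel
        if not new:
            return rel
        rel |= new


def is_partial_order(rel: Set[Pair]) -> bool:
    # rel is transitive, so any cycle shows up as a pair (x, x).
    return all(a != b for a, b in rel)


if __name__ == "__main__":
    seq = {"T1": ["a1", "a2"], "T2": ["c1", "c2"]}
    dep = {("a1", "a2"), ("c1", "c2"), ("a1", "c1"), ("c2", "a2")}

    def level(t: str, u: str) -> int:
        return 1  # 2-nest: distinct transactions related only at level 1

    no_breaks = {"T1": {1: set()}, "T2": {1: set()}}
    with_break = {"T1": {1: {("a1", "a2")}}, "T2": {1: set()}}
    print(is_partial_order(coherent_closure(dep, seq, no_breaks, level)))   # False
    print(is_partial_order(coherent_closure(dep, seq, with_break, level)))  # True
```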

Assume that the concurrency control generates an execution sequence
e of S, and that the concurrency control includes some priority scheme
and rollback mechanism to insure that no initiated transaction gets
blocked indefinitely. (Such a scheme is not specified here.) I
consider how to insure that the coherent closure of $<_e$ is a partial order.

One possible strategy is cycle detection, using the coherent closure
of $<_e$. Namely, if the concurrency control does not otherwise guarantee
that $<_e$ is extendable to a coherent partial order, the
concurrency control might generate explicitly the edges of the coherent
closure of $<_e$, and check for cycles. If a cycle is detected, a priority
scheme can be used to determine which steps should be rolled back.
Presumably, fewer cycles would be detected using the multilevel
atomicity definition than if serializability were required,
leading to fewer rollbacks.
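For rollback, the concurrency control needs not only to know that a cycle exists but also which steps lie on it. A depth-first search over the explicitly generated edges (before any transitive closure, so that the cycle comes back as a path rather than a self-loop) is one standard way to find one. In the Python sketch below, edges are assumed to be a set of step pairs; the `priority` map and the rule "roll back the lowest-priority transaction on the cycle" are hypothetical placeholders, since the priority scheme is not specified here.

```python
from typing import Dict, List, Optional, Set, Tuple

Node = str


def find_cycle(edges: Set[Tuple[Node, Node]]) -> Optional[List[Node]]:
    """Return the nodes of some directed cycle, in order, or None if acyclic."""
    succ: Dict[Node, List[Node]] = {}
    for a, b in edges:
        succ.setdefault(a, []).append(b)
    on_stack: Set[Node] = set()
    done: Set[Node] = set()
    path: List[Node] = []

    def dfs(u: Node) -> Optional[List[Node]]:
        on_stack.add(u)
        path.append(u)
        for v in succ.get(u, []):
            if v in on_stack:
                return path[path.index(v):]  # back edge u -> v closes a cycle
            if v not in done:
                found = dfs(v)
                if found:
                    return found
        on_stack.discard(u)
        path.pop()
        done.add(u)
        return None

    for u in list(succ):
        if u not in done:
            found = dfs(u)
            if found:
                return found
    return None


def choose_victim(cycle: List[Node], owner: Dict[Node, str],
                  priority: Dict[str, int]) -> str:
    """Hypothetical policy: roll back the lowest-priority transaction on the cycle."""
    return min((owner[s] for s in cycle), key=lambda t: priority[t])


if __name__ == "__main__":
    # Edges generated for two crossed transactions with no breakpoints.
    edges = {("a1", "c1"), ("a2", "c1"), ("c1", "c2"), ("c2", "a2"), ("a1", "a2")}
    owner = {"a1": "T1", "a2": "T1", "c1": "T2", "c2": "T2"}
    cycle = find_cycle(edges)
    print(choose_victim(cycle, owner, {"T1": 2, "T2": 1}))  # T2
```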

Another approach is to attempt to guarantee that the coherent
closure of $<_e$ is a partial order. One way of doing this might be to
delay some steps, as follows.

Each step $\beta$ first gets "scheduled", thereby locking its entity and
delaying its transaction. $\beta$ does not actually get "performed" until it
insures the following. (Note that e refers to the order in which steps
actually get performed, not the order in which they are scheduled.) If $e_\beta$
is the initial segment of e ending with step $\beta$, and if $\alpha$ is the
last step of transaction t which precedes $\beta$ in the coherent closure of
$<_{e_\beta}$, then a breakpoint for $\beta$'s transaction immediately follows $\alpha$ in
the execution sequence prefix of t occurring as a subsequence of $e_\beta$.
(This can be accomplished by making $\beta$ wait until suitable breakpoints
have been reached, assuming that the concurrency control uses a priority-rollback
mechanism for preventing blocking.)

If the property above is guaranteed for each $\beta$, then the coherent
closure of $<_e$ is consistent with the total ordering of steps in e, so it
must be a partial order.

Of course, there are still many difficulties involved in designing
a priority-rollback scheme to guarantee that no transactions block.

Another, related difficulty is the design of a mechanism for allowing
transactions to commit: even though the concurrency control guarantees
eventual performance of all of the steps of a correct execution sequence
e, it does not necessarily follow that the concurrency control can determine
a particular point in time when each transaction can no longer have any
of its steps rolled back! This is apparently a greater difficulty for
multilevel atomicity than it is for ordinary atomicity, since multilevel
atomicity allows (even if there are only a finite number of entities) an
infinite chain of transactions $t_1, t_2, t_3, \ldots$ such that for each i there
are steps $\alpha$ of $t_i$ and $\beta$ of $t_{i+1}$ with $\beta <_e \alpha$. This means that it is
quite plausible that a rollback of steps of $t_1$ can cause a rollback of
steps of $t_2$, and so on.
10. Further Research

Here, I have really only suggested a new, general correctness
criterion. It remains to design detailed concurrency controls based on
this criterion, in order to determine if the generalization can be
exploited for increased efficiency.